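Automatic emotion recognition (ER) has recently attracted considerable interest because of its potential in many real-world applications. In this context, multimodal approaches have been shown to improve performance over unimodal approaches by combining diverse and complementary sources of information, which provides some robustness to noisy and missing modalities. In this paper, we focus on dimensional ER based on the fusion of facial and vocal modalities extracted from videos, where complementary audio-visual (A-V) relationships are explored to predict an individual's emotional state in the valence-arousal space. Most state-of-the-art fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively exploit the complementary nature of the A-V modalities. To address this problem, we introduce a joint cross-attention model for A-V fusion that extracts salient features across the A-V modalities, which makes it possible to exploit inter-modal relationships effectively while retaining intra-modal relationships. In particular, it computes cross-attention weights based on the correlation between the joint feature representation and that of the individual modalities. Deploying the joint A-V feature representation in the cross-attention module helps to exploit intra-modal and inter-modal relationships simultaneously, which significantly improves system performance over the vanilla cross-attention module. The effectiveness of the proposed approach is validated experimentally on challenging videos from the RECOLA and AffWild2 datasets. Results indicate that our joint cross-attention A-V fusion model provides a cost-effective solution that can outperform state-of-the-art approaches, even when the modalities are noisy or absent.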
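The idea of correlating each modality with a joint representation can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy feature values, the fixed projection matrices `proj_a` and `proj_v` (which would be learned in practice), the `tanh` scaling, and the residual update in `attend`. It is not the authors' implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def joint_correlation(feats, joint, proj):
    """Correlate one modality (T x d_m) with the joint features (T x d_joint).

    Computes tanh(feats @ proj @ joint^T / sqrt(d_joint)), a T x T map.
    """
    d_joint = len(joint[0])
    raw = matmul(matmul(feats, proj), transpose(joint))
    return [[math.tanh(v / math.sqrt(d_joint)) for v in row] for row in raw]

def attend(feats, corr):
    """Assumed residual update: add a correlation-weighted mix of all time steps."""
    steps = len(feats)
    mixed = matmul(corr, feats)
    return [[f + m / steps for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(feats, mixed)]

# Toy inputs: T = 3 time steps, 2 features per modality (hypothetical values).
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
# Joint representation: per-step concatenation of both modalities (T x 4).
joint = [a + v for a, v in zip(audio, visual)]

# Hypothetical stand-ins for learned projections (d_m x d_joint).
proj_a = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
proj_v = [[0.5, 0.0, 1.0, 0.0], [0.0, 0.5, 0.0, 1.0]]

corr_a = joint_correlation(audio, joint, proj_a)
corr_v = joint_correlation(visual, joint, proj_v)
audio_att = attend(audio, corr_a)
visual_att = attend(visual, corr_v)
# Fused features: concatenate the attended modalities per time step.
fused = [a + v for a, v in zip(audio_att, visual_att)]
```

Because the joint features contain both modalities, each correlation map mixes intra-modal and inter-modal similarity, which is the property the abstract emphasises.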
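Multimodal analysis has recently drawn much interest in affective computing, since it can improve the overall accuracy of emotion recognition over isolated unimodal approaches. The most effective techniques for multimodal emotion recognition efficiently exploit diverse and complementary sources of information, such as facial, vocal, and physiological modalities, to provide comprehensive feature representations. In this paper, we focus on dimensional emotion recognition based on the fusion of facial and vocal modalities extracted from videos, where complex spatiotemporal relationships can be captured. Most existing fusion techniques rely on recurrent networks or conventional attention mechanisms that do not effectively exploit the complementary nature of the audio-visual (A-V) modalities. We introduce a cross-attentional fusion approach that extracts salient features across the A-V modalities, allowing accurate prediction of continuous values of valence and arousal. Our new cross-attentional A-V fusion model efficiently exploits inter-modal relationships. In particular, it computes cross-attention weights to focus on the more contributive features across the individual modalities, and thereby combines the contributive feature representations, which are then fed to fully connected layers for the prediction of valence and arousal. The effectiveness of the proposed approach is validated experimentally on videos from the RECOLA and Fatigue (private) datasets. Results indicate that our cross-attentional A-V fusion model is a cost-effective approach that outperforms state-of-the-art fusion approaches. Code is available at https://github.com/praveena2j/cross-attentional-av-fusion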
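As a rough illustration of cross-attention between two modalities, the sketch below lets each time step of one modality attend over all time steps of the other. It uses scaled dot-product scores with a softmax and no learned parameters, which is a simplifying assumption. The feature values are made up, and the formulation in the linked repository may differ.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax_rows(m):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out

def cross_attend(query_feats, context_feats):
    """Each query time step attends over all context time steps.

    Both inputs are T x d lists. Returns (attended T x d, weights T x T).
    """
    d = len(query_feats[0])
    scores = matmul(query_feats, transpose(context_feats))
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, context_feats), weights

# Toy inputs: T = 3 time steps, 2 features per modality (hypothetical values).
audio = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]

audio_att, w_av = cross_attend(audio, visual)   # audio queries, visual context
visual_att, w_va = cross_attend(visual, audio)  # visual queries, audio context
# Concatenate per time step; a regressor for valence and arousal would follow.
fused = [a + v for a, v in zip(audio_att, visual_att)]
```

In a trained model the scores would pass through learned projections, and `fused` would feed the fully connected layers mentioned in the abstract.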
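Automatic estimation of pain intensity from facial expressions in videos has immense potential in healthcare applications. However, domain adaptation (DA) is needed to alleviate the domain shift that typically occurs between video data captured in the source and target domains. Given the laborious task of collecting and annotating videos, and the subjective bias caused by ambiguity among adjacent intensity levels, weakly-supervised learning (WSL) is gaining attention in such applications. However, most state-of-the-art WSL models are formulated as regression problems, and exploit neither the ordinal relationships among intensity levels nor the temporal coherence of multiple consecutive frames. This paper introduces a new deep learning model for weakly-supervised DA with ordinal regression (WSDA-OR), where videos in the target domain have coarse labels provided on a periodic basis. The WSDA-OR model enforces ordinal relationships among the intensity levels assigned to target sequences, and associates multiple relevant frames with sequence-level labels (instead of a single frame). In particular, it learns discriminant and domain-invariant feature representations by integrating multiple instance learning with deep adversarial DA, where soft Gaussian labels are used to efficiently represent the weak ordinal sequence-level labels from the target domain. The proposed approach was validated using the RECOLA video dataset as the fully-labeled source domain and the UNBC-McMaster video data as the weakly-labeled target domain. We also validated WSDA-OR on the BioVid and Fatigue (private) datasets for sequence-level estimation. Experimental results indicate that our approach can provide a significant improvement over state-of-the-art models, achieving greater localization accuracy.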
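One common way to build a soft Gaussian label for an ordinal level is sketched below: probability mass is centred on the annotated level and decays with distance, so neighbouring levels receive more mass than distant ones. The number of levels, the value of `sigma`, and the normalisation are assumptions chosen for illustration. The abstract does not specify how WSDA-OR constructs its labels.

```python
import math

def soft_gaussian_label(level, num_levels, sigma=1.0):
    """Return a distribution over ordinal levels 0..num_levels-1 peaked at `level`."""
    weights = [math.exp(-((k - level) ** 2) / (2 * sigma ** 2))
               for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_level(dist):
    """Collapse a soft label back to a scalar intensity estimate."""
    return sum(k * p for k, p in enumerate(dist))

# Hypothetical example: 6 intensity levels, coarse sequence label of 2.
label = soft_gaussian_label(level=2, num_levels=6, sigma=1.0)
estimate = expected_level(label)
```

Unlike a one-hot target, this encoding penalises a prediction of level 3 less than a prediction of level 5 when the annotated level is 2, which reflects the ordinal structure.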
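Machine learning (ML) research has generally focused on models, while the most prominent datasets have been used for everyday ML tasks without regard for their breadth, difficulty, and faithfulness to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems, including data cascades in real-world applications and the saturation of dataset-driven criteria for model quality, which hinders research growth. To address this problem, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-working algorithms. We intend it to enable a "data ratchet," in which training sets help to evaluate test sets on the same problems, and vice versa. This feedback-driven strategy will produce a virtuous cycle that accelerates data-centric AI. The MLCommons Association will maintain DataPerf.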
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
A Digital Twin (DT) is a simulation of a physical system that provides information to make decisions that add economic, social or commercial value. Because the behaviour of a physical system changes over time, a DT must be continually updated with data from the physical system to reflect its changing behaviour. For resource-constrained systems, updating a DT is non-trivial because of challenges such as on-board learning and off-board data transfer. This paper presents a framework for updating data-driven DTs of resource-constrained systems geared towards system health monitoring. The proposed solution consists of: (1) an on-board system running a light-weight DT, allowing the prioritisation and parsimonious transfer of data generated by the physical system; and (2) off-board robust updating of the DT and detection of anomalous behaviours. Two case studies using a production gas turbine engine system demonstrate the digital representation accuracy for real-world, time-varying physical systems.
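One plausible reading of on-board prioritisation is sketched below: the light-weight DT predicts each measurement, and only the observations it explains worst are sent off-board, up to a transfer budget. The ranking by absolute residual, the `budget` parameter, and the linear stand-in model `lightweight_twin` are all assumptions. The abstract does not describe the actual prioritisation rule.

```python
def prioritise_for_transfer(observations, predict, budget):
    """Keep the `budget` observations with the largest absolute prediction error.

    observations: list of (input, measured_output) pairs.
    predict: the on-board model, mapping an input to a predicted output.
    """
    scored = [(abs(y - predict(x)), x, y) for x, y in observations]
    # Sort by residual only, largest first; ties keep their original order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(x, y) for _, x, y in scored[:budget]]

def lightweight_twin(x):
    """Hypothetical on-board model: output is twice the input."""
    return 2.0 * x

# Hypothetical sensor readings; the third one departs strongly from the model.
readings = [(1, 2.1), (2, 4.0), (3, 9.0), (4, 8.2)]
selected = prioritise_for_transfer(readings, lightweight_twin, budget=2)
```

The points the on-board model already predicts well carry little new information for updating the off-board DT, so dropping them first keeps the transfer small.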
We consider infinite horizon Markov decision processes (MDPs) with fast-slow structure, meaning that certain parts of the state space move "fast" (and in a sense, are more influential) while other parts transition more "slowly." Such structure is common in real-world problems where sequential decisions need to be made at high frequencies, yet information that varies at a slower timescale also influences the optimal policy. Examples include: (1) service allocation for a multi-class queue with (slowly varying) stochastic costs, (2) a restless multi-armed bandit with an environmental state, and (3) energy demand response, where both day-ahead and real-time prices play a role in the firm's revenue. Models that fully capture these problems often result in MDPs with large state spaces and large effective time horizons (due to frequent decisions), rendering them computationally intractable. We propose an approximate dynamic programming algorithmic framework based on the idea of "freezing" the slow states, solving a set of simpler finite-horizon MDPs (the lower-level MDPs), and applying value iteration (VI) to an auxiliary MDP that transitions on a slower timescale (the upper-level MDP). We also extend the technique to a function approximation setting, where a feature-based linear architecture is used. On the theoretical side, we analyze the regret incurred by each variant of our frozen-state approach. Finally, we give empirical evidence that the frozen-state approach generates effective policies using just a fraction of the computational cost, while illustrating that simply omitting slow states from the decision modeling is often not a viable heuristic.
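The two-level structure can be sketched on a toy problem. The code below is a simplified reconstruction under stated assumptions, not the paper's algorithm: the slow state is assumed to move once every `horizon` fast steps, the lower level is solved by backward induction with the slow state frozen, and the upper level repeats value-iteration sweeps over the resulting values. All names (`solve_lower_level`, `frozen_state_vi`, `fast_step`, `slow_step`) and the toy dynamics are hypothetical.

```python
def solve_lower_level(slow, horizon, terminal, fast_states, actions,
                      reward, fast_step, gamma):
    """Backward induction over `horizon` steps with the slow state held fixed.

    terminal[f] is the value of ending the horizon in fast state f.
    """
    values = dict(terminal)
    for _ in range(horizon):
        values = {
            f: max(
                reward(slow, f, a)
                + gamma * sum(p * values[f2]
                              for f2, p in fast_step(slow, f, a).items())
                for a in actions
            )
            for f in fast_states
        }
    return values

def frozen_state_vi(slow_states, fast_states, actions, reward, fast_step,
                    slow_step, horizon, gamma, sweeps=200):
    """Upper-level value iteration on an MDP that moves every `horizon` steps."""
    upper = {(s, f): 0.0 for s in slow_states for f in fast_states}
    for _ in range(sweeps):
        new_upper = {}
        for s in slow_states:
            # After the frozen period, the slow state is released and moves once.
            terminal = {f: sum(p * upper[(s2, f)]
                               for s2, p in slow_step(s).items())
                        for f in fast_states}
            lower = solve_lower_level(s, horizon, terminal, fast_states,
                                      actions, reward, fast_step, gamma)
            for f in fast_states:
                new_upper[(s, f)] = lower[f]
        upper = new_upper
    return upper

# Toy problem: reward 1 when the action matches the slow state.
slow_states, fast_states, actions = [0, 1], [0, 1], [0, 1]
reward = lambda s, f, a: 1.0 if a == s else 0.0
fast_step = lambda s, f, a: {a: 1.0}              # action sets the next fast state
slow_step = lambda s: {s: 0.9, 1 - s: 0.1}        # slow state rarely flips

values = frozen_state_vi(slow_states, fast_states, actions, reward,
                         fast_step, slow_step, horizon=3, gamma=0.9)
```

In this toy problem a reward of 1 is attainable at every step, so each value converges to 1 / (1 - 0.9) = 10. In general, freezing the slow state is an approximation, and the paper's regret analysis quantifies its cost.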
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually-degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and re-configurable communications.
Machine learning is the dominant approach to artificial intelligence, through which computers learn from data and experience. In the framework of supervised learning, for a computer to learn from data accurately and efficiently, some auxiliary information about the data distribution and target function should be provided to it through the learning model. This notion of auxiliary information relates to the concept of regularization in statistical learning theory. A common feature among real-world datasets is that data domains are multiscale and target functions are well-behaved and smooth. In this paper, we propose a learning model that exploits this multiscale data structure and discuss its statistical and computational benefits. The hierarchical learning model is inspired by the logical and progressive easy-to-hard learning mechanism of human beings and has interpretable levels. The model apportions computational resources according to the complexity of data instances and target functions. This property can have multiple benefits, including higher inference speed and computational savings in training a model for many users or when training is interrupted. We provide a statistical analysis of the learning mechanism using multiscale entropies and show that it can yield significantly stronger guarantees than uniform convergence bounds.
Implicit Neural Representations (INR) have recently been shown to be a powerful tool for high-quality video compression. However, existing works are limited because they do not explicitly exploit the temporal redundancy in videos, leading to long encoding times. Additionally, these methods have fixed architectures which do not scale to longer videos or higher resolutions. To address these issues, we propose NIRVANA, which treats videos as groups of frames and fits a separate network to each group, performing patch-wise prediction. This design shares computation within each group, in the spatial and temporal dimensions, resulting in reduced encoding time of the video. The video representation is modeled autoregressively, with networks fit on a current group initialized using weights from the previous group's model. To further enhance efficiency, we perform quantization of the network parameters during training, requiring no post-hoc pruning or quantization. When compared with previous works on the benchmark UVG dataset, NIRVANA improves encoding quality from 37.36 to 37.70 (in terms of PSNR) and the encoding speed by 12X, while maintaining the same compression rate. In contrast to prior video INR works which struggle with larger resolutions and longer videos, we show that our algorithm is highly flexible and scales naturally due to its patch-wise and autoregressive designs. Moreover, our method achieves variable bitrate compression by adapting to videos with varying inter-frame motion. NIRVANA achieves 6X decoding speed and scales well with more GPUs, making it practical for various deployment scenarios.
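To put the reported PSNR figures in context, the sketch below applies the standard definition PSNR = 10 log10(MAX^2 / MSE) and converts a PSNR gain into the corresponding change in mean squared error. It assumes the reported values are in decibels and that both are measured against the same peak value. The pixel values and function names are made up for illustration.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)

def mse_ratio_for_gain(psnr_before, psnr_after):
    """Ratio MSE_after / MSE_before implied by a PSNR change (same peak value)."""
    return 10.0 ** (-(psnr_after - psnr_before) / 10.0)

# Hypothetical pixel values for a tiny reference and its reconstruction.
example = psnr([10, 20, 30, 40], [12, 18, 33, 40])
# The gain quoted in the abstract: 37.36 -> 37.70.
ratio = mse_ratio_for_gain(37.36, 37.70)
```

Under these assumptions, the 0.34 dB gain corresponds to an MSE ratio of about 0.925, that is, roughly 7.5% lower mean squared error at the same compression rate.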